Seamless Coarse Grained Parallelism Integration in Intensive Bioinformatics Workflows
To be easily constructed, shared and maintained, complex in silico bioinformatics analyses are structured as workflows. Furthermore, the growing computational power and storage demands of this domain require workflows to be executed efficiently. However, workflow performance usually relies on the designer's ability to extract potential parallelism, and atomic bioinformatics tasks often do not exhibit direct parallelism, which may only appear later in the workflow design process. In this paper, we propose a Model-Driven Architecture approach for capturing the complete design process of bioinformatics workflows. More precisely, two workflow models are specified: the first, called the design model, graphically captures a low-throughput prototype; the second, called the execution model, specifies multiple levels of coarse-grained parallelism. The execution model is automatically generated from the design model using annotations derived from the EDAM ontology; these annotations describe the data types connecting the different elementary tasks. The execution model can then be interpreted by a workflow engine and executed on hardware with intensive computation facilities.
Quality metrics for benchmarking sequences comparison tools
Comparing sequences is a daily task in bioinformatics, and many software tools try to fulfill this need by offering fast execution times and accurate results. Introducing a new tool in this field requires comparing it to recognized tools with the help of well-defined metrics. We propose a set of quality metrics that enables a systematic approach to comparing alignment tools. These metrics have been implemented in a dedicated program that produces textual and graphical benchmark artifacts.
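The abstract does not detail the metrics themselves, but a common building block for benchmarking aligners is comparing reported read placements against a known truth set. The sketch below is purely illustrative (the function and data names are assumptions, not the paper's actual metrics):

```python
# Hypothetical sketch: precision/recall of an aligner's reported read
# placements against a gold-standard truth set. Names are illustrative.

def placement_metrics(reported, truth):
    """reported, truth: dicts mapping read id -> (reference, position)."""
    # A placement is correct when it matches the truth set exactly.
    correct = sum(1 for r, loc in reported.items() if truth.get(r) == loc)
    precision = correct / len(reported) if reported else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

truth = {"r1": ("chr1", 100), "r2": ("chr1", 250), "r3": ("chr2", 40)}
reported = {"r1": ("chr1", 100), "r2": ("chr1", 260)}
p, r = placement_metrics(reported, truth)
# p == 0.5 (1 of 2 reported placements correct), r == 1/3
```

Real benchmarks typically also tolerate small position offsets and account for multi-mapping reads; this minimal version only checks exact matches.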
GASSST: global alignment short sequence search tool
Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus twofold: achieving high performance with no restrictions on the number of indels, with a design that remains effective on long reads.
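For context on what "no restrictions on the number of indels" means, the classic unrestricted formulation is Needleman-Wunsch global alignment, where every cell considers a match/mismatch, an insertion, or a deletion. The sketch below shows that scoring recurrence only; GASSST's actual contribution (its filtering and indexing strategy) is far more involved:

```python
# Illustrative Needleman-Wunsch scoring: global alignment with an
# unbounded number of indels. Not GASSST's algorithm, just the baseline
# dynamic-programming recurrence that unrestricted aligners build on.

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    # dp[i][j]: best score aligning a[:i] with b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap          # a aligned against leading gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap          # b aligned against leading gaps
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # match/mismatch
                           dp[i - 1][j] + gap,      # deletion
                           dp[i][j - 1] + gap)      # insertion
    return dp[-1][-1]

nw_score("ACGT", "ACGT")  # 4 (all matches)
nw_score("ACGT", "AGT")   # 2 (three matches, one gap)
```

The quadratic cost of this recurrence is exactly why fast aligners add seeding and filtering layers on top of it.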
Parallelization of the K-means algorithm on a reconfigurable system: Application to hyper-spectral images
The article presents a parallel architecture dedicated to the k-means algorithm, used for clustering objects in a non-hierarchical way. We propose an implementation on a reconfigurable system made of a PC and an FPGA board closely coupled through the I/O bus. Experiments carried out on a set of hyper-spectral images show that the computation time is reduced from a few hours to a few minutes. We also point out the influence of the quality of the link between the processor and the FPGA board on the overall system performance.
An Integrated Systolic Array for Typing-Error Correction
This report presents the design of a VLSI circuit dedicated to the correction of typing errors. The circuit architecture is based on a regular structure: a two-dimensional systolic array of 69 processors. The methodology followed during the design of the circuit takes advantage of this regularity, particularly during the validation phases.
Multiple Comparative Metagenomics using Multiset k-mer Counting
Background. Large-scale metagenomic projects aim to extract biodiversity knowledge from different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomic or functional assignment rely on the small subset of sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results.
Methods. These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today's metagenomic projects thanks to a new parallel k-mer counting strategy over multiple datasets.
Results. Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute, in a few hours, both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques that rely on all-versus-all sequence alignment strategies or are based on taxonomic profiling.
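The core idea, replacing species counts with k-mer counts in a standard ecological distance, can be shown in a few lines. This toy sketch is an assumption-laden illustration (the function names and the choice of Bray-Curtis here are ours, not Simka's API, and Simka's parallel counting strategy is precisely what this naive version lacks):

```python
# Toy illustration of the Simka idea: count k-mers per sample, then feed
# the counts to a standard ecological distance (Bray-Curtis here).
# Names are illustrative; this is not Simka's actual implementation.
from collections import Counter

def kmer_counts(reads, k=4):
    c = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            c[r[i:i + k]] += 1
    return c

def bray_curtis(a, b):
    # 1 - 2*shared / (total_a + total_b), computed on k-mer multisets
    # exactly as it would be on species abundance tables.
    shared = sum(min(a[x], b[x]) for x in a.keys() & b.keys())
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

s1 = kmer_counts(["ACGTACGT", "ACGTTTTT"])
s2 = kmer_counts(["ACGTACGT", "GGGGGGGG"])
d = bray_curtis(s1, s2)  # 0 = identical k-mer content, 1 = disjoint
```

The qualitative/quantitative split mentioned in the abstract corresponds to presence/absence distances (e.g. Jaccard) versus abundance-weighted ones like the Bray-Curtis form above.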
A Component model for synchronous VLSI system design
Available in the files attached to this document
- …